Protein structure determination using metagenome sequence data.

Authors

  • Sergey Ovchinnikov
  • Hahnbeom Park
  • Neha Varghese
  • Po-Ssu Huang
  • Georgios A Pavlopoulos
  • David E Kim
  • Hetunandan Kamisetty
  • Nikos C Kyrpides
  • David Baker
Abstract

Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.

Similar resources

A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes ...

Annotation of metagenome short reads using proxygenes

Motivation: A typical metagenome dataset generated using a 454 pyrosequencing platform consists of short reads sampled from the collective genome of a microbial community. The amount of sequence in such datasets is usually insufficient for assembly, and traditional gene prediction cannot be applied to unassembled short reads. As a result, analysis of such datasets usually involves comparisons in...

UProC: tools for ultra-fast protein domain classification

Motivation: With rapidly increasing volumes of biological sequence data the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics. Results: The ultrafast protein classification (UProC) toolbox implements a novel algorithm (‘Mosaic Matching’) for large-scale sequence analysis. UProC is by three orders of magnitude faster than ...

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

Design and Production of Recombinant TAT Protein Structure, Catalytic Domain of Diphtheria Toxin, and Evaluation of Its Effect on Cell Line

Background and Objectives: Cancer is one of the most deadly diseases in the present age and its conventional therapies have had low success. Toxin therapy of cancer is a new therapeutic approach, which has attracted the attention of pharmaceutical specialists. Diphtheria toxin consists of three functional, transducing, and binding domains, that the functional part inhibits protein synthesis and...

Journal title:
  • Science

Volume 355, Issue 6322

Pages  -

Publication date: 2017